Regular Expression Matching Can Be Simple And Fast

Mark Leighton Fisher on 2008-01-11T17:06:26

Regular Expression Matching Can Be Simple And Fast is worth a look. re::engine::Plan9 should let you test this approach.


Dang, I wish I knew this a few years ago

btilly on 2008-01-12T16:06:42

I figured it out on my own and thought I had figured out something new.

It's a very nice article

bart on 2008-01-14T08:58:03

... but I wish their conclusion wasn't so bozo:

Regular expression matching can be simple and fast, using finite automata-based techniques that have been known for decades. In contrast, Perl, PCRE, Python, Ruby, Java, and many other languages have regular expression implementations based on recursive backtracking that are simple but can be excruciatingly slow.
... for pathetic expressions!

We're indeed talking about the regular expression /a?a?a?aaa/ which no experienced Perl code would ever use in real code.

Rewrite it, for example, as

/(?:a(?:a(?:a)?)?)?aaa/
and try again. (Caveat: I haven't benchmarked it myself.)

The problem isn't as much of a problem as these guys want to make you believe.

Re:It's a very nice article

Aristotle on 2008-01-14T20:16:50

Rewrite it, for example, as

/(?:a(?:a(?:a)?)?)?aaa/

Err, come again? Surely you mean “/a{3,6}/”?

Re:It's a very nice article

bart on 2008-01-15T20:19:38

Even better. :)

I'm just saying that every pathetic regexp can most likely be rewritten into something that is not pathetic witout much trouble — and possibly even be optimized.

Rewriting Regexes

Mark Leighton Fisher on 2008-01-18T17:39:28

I thought the article was important enough to post but I have to confess, I've never run into the problems shown in the article. I've even written my own (simple) regex matcher in C without running into those problems. Maybe this is an example of Jamie Zawinski's

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.

or Abraham Maslow's:

If you only have a hammer then you treat everything like a nail.

Sometimes, you really don't want to use regexes...